Abstract: Traffic classification is the process of categorizing computer network traffic according to various parameters (for example, port number or protocol) into traffic classes such as Sensitive, Best-Effort, and Undesired. Each resulting traffic class can be treated differently in order to differentiate the service provided to the user. Owing to the growth in Internet users and bandwidth-hungry applications, the volume of Internet traffic data generated is very large, and scalable tools are required to analyze, measure, and classify it. Traditional tools fail at this task because of their limited computational and storage capacity. Hadoop is a distributed framework that can perform this task efficiently: it runs on commodity hardware with distributed storage and processes this large volume of traffic data using Hive. We present a Hadoop-based Traffic Analysis, Measurement, and Classification tool that performs traffic analysis, measurement, and classification with respect to various parameters at the packet and flow levels. The results can be used by network administrators and ISPs for various purposes. Internet traffic measurement and analysis has long been used to characterize network usage and user behaviour, but it faces a scalability problem under the explosive growth of Internet traffic and high-speed access. We propose a traffic monitoring system that performs IP, ICMP, TCP, HTTP, and UDP analysis of multiple terabytes of Internet traffic in a scalable manner. The system is intended to address performance challenges such as accuracy and scalability.

Keywords: Internet traffic, traffic measurement and analysis, HDFS, Hive, QlikView, Tableau.